training-set attack
Understanding Black-box Predictions via Influence Functions
How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
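The scalable recipe the abstract describes (oracle access to gradients and Hessian-vector products) can be illustrated concretely. Below is a minimal NumPy sketch for the simplest case, ridge regression: it solves H s = ∇ℓ(z_test) by conjugate gradient using only an HVP oracle, then scores a training point z_i by -sᵀ∇ℓ(z_i), the influence-function approximation to the change in test loss from upweighting z_i. All function names here are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Minimizer of (1/2n) * sum((x_i @ theta - y_i)^2) + (lam/2) * ||theta||^2.
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def hvp(X, lam, v):
    # Hessian-vector product H v = X^T X v / n + lam v,
    # computed without ever forming the Hessian explicitly.
    n = X.shape[0]
    return X.T @ (X @ v) / n + lam * v

def conjugate_gradient(hvp_fn, b, iters=100, tol=1e-10):
    # Solve H s = b using only calls to the HVP oracle.
    s = np.zeros_like(b)
    r = b - hvp_fn(s)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp_fn(p)
        alpha = rs / (p @ Hp)
        s = s + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return s

def influence(X, y, theta, lam, x_test, y_test, i):
    # I_up,loss(z_i, z_test) = -grad_test^T H^{-1} grad_i:
    # the first-order effect on the test loss of upweighting point i.
    grad_test = (x_test @ theta - y_test) * x_test
    s_test = conjugate_gradient(lambda v: hvp(X, lam, v), grad_test)
    grad_i = (X[i] @ theta - y[i]) * X[i]
    return -s_test @ grad_i
```

Ranking all training points by this score is, in spirit, how the paper surfaces the examples most responsible for a given prediction; for deep networks the exact Hessian is replaced by the same HVP-based solvers applied to the network's loss.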
Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners
Mei, Shike (University of Wisconsin-Madison) | Zhu, Xiaojin (University of Wisconsin-Madison)
We investigate a problem at the intersection of machine learning and security: training-set attacks on machine learners. In such attacks, an attacker contaminates the training data so that a specific learning algorithm produces a model profitable to the attacker. Understanding training-set attacks is important as more intelligent agents (e.g., spam filters and robots) are equipped with learning capabilities and can potentially be hacked via the data they receive from the environment. This paper identifies the optimal training-set attack on a broad family of machine learners. First, we show that the optimal training-set attack can be formulated as a bilevel optimization problem. Then we show that, for machine learners with certain Karush-Kuhn-Tucker (KKT) conditions, we can solve the bilevel problem efficiently using gradient methods on an implicit function. As examples, we demonstrate optimal training-set attacks on support vector machines, logistic regression, and linear regression with extensive experiments. Finally, we discuss potential defenses against such attacks.
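To make the bilevel/KKT idea concrete, here is a minimal sketch for the ridge-regression learner, where the inner KKT condition (XᵀX/n + λI)θ − Xᵀy/n = 0 can be differentiated implicitly to obtain the attacker's gradient with respect to the training labels. The label-only attack surface and all names are assumptions chosen for illustration; the paper treats more general data modifications and a wider learner family.

```python
import numpy as np

def train_ridge(X, y, lam):
    # Inner problem: the learner's KKT (stationarity) condition is
    # (X^T X / n + lam I) theta - X^T y / n = 0.
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)
    return np.linalg.solve(H, X.T @ y / n), H

def attack_labels(X, y, lam, theta_target, steps=200, lr=0.5):
    # Bilevel attack: adjust training labels y so that the learned model
    # theta*(y) moves toward theta_target. Differentiating the KKT
    # condition implicitly gives d theta*/d y = H^{-1} X^T / n, so the
    # outer gradient is (2/n) X H^{-1} (theta* - theta_target).
    n = X.shape[0]
    y = y.copy()
    for _ in range(steps):
        theta, H = train_ridge(X, y, lam)
        g = 2 * (theta - theta_target)   # gradient of outer loss in theta
        s = np.linalg.solve(H, g)        # H^{-1} g via the implicit function
        grad_y = X @ s / n               # chain rule through the KKT system
        y -= lr * grad_y
    return y
```

In the paper's formulation the attacker's objective also charges for effort, so a realistic version of this sketch would add a penalty or budget on how far the poisoned labels may drift from the clean ones, keeping the contamination hard to detect.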